
docs: show struct-returning aggregate window metadata pattern #23248

Merged. alamb merged 9 commits into apache:main from ametel01:issue-16453-struct-returning-aggregate on Jul 1, 2026.

@ametel01 (Contributor)

Which issue does this PR close?

Rationale for this change

Issue #16453 asks how an extension can expose window metadata such as window_start, window_end, and window_duration when grouping with a custom window assignment function.

The earlier nullary aggregate UDF approach in #23038 turned out to be the wrong abstraction. Generic aggregate accumulators need real input arrays, both to provide row context and to support normal multi-stage aggregate execution. In the issue and PR discussion, @alamb suggested modeling this instead as a selector-style aggregate that receives real input columns and returns a struct containing both the metadata and the aggregate result.
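The following minimal, std-only Rust sketch makes the selector-style shape concrete. It is not DataFusion's Accumulator trait, although the method names loosely echo it. The names AugmentedAvgState, AugmentedAvg, update_batch, merge, and evaluate are hypothetical. The sketch also assumes that window_start and window_end are the earliest and latest timestamps seen in the group. Because the state is built from the real time and value inputs, partial states computed on different partitions can be merged, which is what multi-stage aggregation needs.

```rust
/// Hypothetical struct-valued result with the four fields named in the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct AugmentedAvg {
    pub window_start: i64,    // earliest timestamp in the group (assumed)
    pub window_end: i64,      // latest timestamp in the group (assumed)
    pub window_duration: i64, // window_end - window_start
    pub avg_value: f64,
}

/// Hypothetical partial state: everything that has to travel from the
/// partial stage to the final stage of a multi-stage aggregation.
#[derive(Debug, Clone, Default)]
pub struct AugmentedAvgState {
    min_time: Option<i64>,
    max_time: Option<i64>,
    sum: f64,
    count: u64,
}

impl AugmentedAvgState {
    /// Consume one batch of the real input columns (time, value).
    pub fn update_batch(&mut self, times: &[i64], values: &[f64]) {
        assert_eq!(times.len(), values.len(), "input columns must have equal length");
        for (&t, &v) in times.iter().zip(values) {
            self.min_time = Some(self.min_time.map_or(t, |m| m.min(t)));
            self.max_time = Some(self.max_time.map_or(t, |m| m.max(t)));
            self.sum += v;
            self.count += 1;
        }
    }

    /// Combine a partial state produced on another partition.
    pub fn merge(&mut self, other: &AugmentedAvgState) {
        if let Some(t) = other.min_time {
            self.min_time = Some(self.min_time.map_or(t, |m| m.min(t)));
        }
        if let Some(t) = other.max_time {
            self.max_time = Some(self.max_time.map_or(t, |m| m.max(t)));
        }
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Produce the struct result, or None for an empty group (SQL NULL).
    pub fn evaluate(&self) -> Option<AugmentedAvg> {
        let (start, end) = (self.min_time?, self.max_time?);
        Some(AugmentedAvg {
            window_start: start,
            window_end: end,
            window_duration: end - start,
            avg_value: self.sum / self.count as f64,
        })
    }
}

fn main() {
    // Two partitions of the same group are aggregated separately, then merged.
    let mut partial_a = AugmentedAvgState::default();
    partial_a.update_batch(&[10, 20], &[1.0, 3.0]);
    let mut partial_b = AugmentedAvgState::default();
    partial_b.update_batch(&[5, 30], &[2.0, 6.0]);
    partial_a.merge(&partial_b);
    println!("{:?}", partial_a.evaluate());
}
```

A real aggregate UDF would emit these four fields as a single struct value. The sketch keeps them in a plain Rust struct so the merge logic stays visible.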

What changes are included in this PR?

This PR documents and tests that struct-returning aggregate pattern:

  • Adds UDAF integration tests for an augmented_avg(time, value) aggregate returning a struct with window_start, window_end, window_duration, and avg_value.
  • Shows direct SQL field projection from the returned struct, for example augmented_avg(time, value)['window_start'].
  • Adds a test-only session_window(time, INTERVAL ...) grouping UDF to demonstrate how an extension can assign rows to windows while the aggregate derives metadata from real input columns.
  • Adds documentation to the library user guide on returning multiple values from an aggregate UDF.
  • Adds a runnable UDF example to datafusion-examples for the same pattern.
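The bullets above divide the work in two: a grouping function assigns rows to windows, and the aggregate derives metadata from the real time column. That division can be modeled in plain Rust. This is a hypothetical sketch, not the PR's code. The names assign_sessions and session_metadata are illustrative. The sketch assumes a gap-based session rule, where a new window starts when the gap to the previous timestamp exceeds the interval, and it assumes the timestamps are already sorted ascending. The PR's test-only session_window UDF may be defined differently.

```rust
/// Hypothetical session-window assignment over timestamps sorted ascending.
/// A new session starts whenever the gap to the previous timestamp exceeds
/// `gap` (assumed semantics). Returns one session id per row: 0, 1, 2, ...
pub fn assign_sessions(times: &[i64], gap: i64) -> Vec<usize> {
    let mut ids = Vec::with_capacity(times.len());
    let mut current = 0;
    for (i, &t) in times.iter().enumerate() {
        if i > 0 && t - times[i - 1] > gap {
            current += 1;
        }
        ids.push(current);
    }
    ids
}

/// Per-session metadata computed from the real time column, returned as
/// (window_start, window_end, window_duration). Expects ids that start at 0
/// and increase by at most 1 per row, as produced by assign_sessions.
pub fn session_metadata(times: &[i64], ids: &[usize]) -> Vec<(i64, i64, i64)> {
    let mut out: Vec<(i64, i64, i64)> = Vec::new();
    for (&t, &id) in times.iter().zip(ids) {
        if id == out.len() {
            // First row of a new session.
            out.push((t, t, 0));
        } else {
            let w = &mut out[id];
            w.0 = w.0.min(t);
            w.1 = w.1.max(t);
            w.2 = w.1 - w.0;
        }
    }
    out
}

fn main() {
    let times: [i64; 6] = [0, 3, 5, 20, 22, 50];
    let ids = assign_sessions(&times, 10);
    println!("session ids: {:?}", ids);
    for (start, end, duration) in session_metadata(&times, &ids) {
        println!("window_start={start} window_end={end} window_duration={duration}");
    }
}
```

Each returned tuple corresponds to the metadata fields that a projection such as augmented_avg(time, value)['window_start'] reads. The avg_value field is left out to keep the focus on window assignment.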

This intentionally does not add nullary aggregate UDF support and does not add first-class planner support for bare virtual columns such as SELECT window_start.

Are these changes tested?

Yes. I ran:

  • cargo run --example udf -- struct_udaf
  • ci/scripts/check_examples_docs.sh
  • cargo test -p datafusion --test user_defined_integration test_augmented_avg
  • cargo fmt --all -- --check
  • ci/scripts/doc_prettier_check.sh
  • git diff --check
  • cargo clippy --all-targets --all-features -- -D warnings
  • ./dev/rust_lint.sh

Are there any user-facing changes?

Yes. This adds documentation and an example for aggregate UDF authors. There are no public API changes.

@github-actions added the documentation and core labels on Jun 30, 2026
@ametel01 (Contributor, Author)

CI is green now. @alamb, when you have time, could you take a look at whether this matches the struct-returning aggregate / selector-style direction you suggested for #16453?

@alamb (Contributor) left a comment

Thank you very much @ametel01 -- this looks great

I think we can/should simplify this example somewhat (I left suggestions), but I also think we could do that simplification in a follow-on PR.

Review comment threads:
  • datafusion-examples/examples/udf/struct_returning_udaf.rs (5 threads, 2 outdated)
  • datafusion/core/tests/user_defined/user_defined_aggregates.rs (1 thread, outdated)
  • docs/source/library-user-guide/functions/adding-udfs.md (3 threads, all outdated)
@ametel01 force-pushed the issue-16453-struct-returning-aggregate branch from 17bba51 to e928e25 on June 30, 2026 19:18
@github-actions removed the core label on Jun 30, 2026
@ametel01 (Contributor, Author) commented Jun 30, 2026

Thanks again for the suggestions. I updated the example to use date_bin, removed the duplicated integration test copy, simplified the docs wording, and added comments to the example.
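Some readers may not know date_bin. For a fixed-width stride, it maps each timestamp to the start of the bin that contains it, counting bins from an origin. DataFusion defaults that origin to 1970-01-01T00:00:00Z. A rough std-only model follows. The function name bin_start and the use of integer nanoseconds are illustrative assumptions, and the sketch does not attempt to cover calendar-based intervals such as months.

```rust
/// Hypothetical model of fixed-stride date_bin: returns the start of the bin
/// that contains `t`. All values are integer nanoseconds since the Unix epoch.
/// Panics if `stride` is not positive.
pub fn bin_start(stride: i64, t: i64, origin: i64) -> i64 {
    assert!(stride > 0, "stride must be positive");
    // div_euclid rounds toward negative infinity for a positive divisor, so a
    // timestamp before `origin` lands in the preceding bin, not the next one.
    origin + (t - origin).div_euclid(stride) * stride
}

pub const NANOS_PER_MINUTE: i64 = 60 * 1_000_000_000;

fn main() {
    let stride = 15 * NANOS_PER_MINUTE; // 15-minute windows
    let origin = 0; // 1970-01-01T00:00:00Z
    let t = 37 * NANOS_PER_MINUTE; // 00:37
    let start = bin_start(stride, t, origin);
    // One way to derive window metadata from the bin: end = start + stride.
    println!(
        "window_start = {} min, window_end = {} min",
        start / NANOS_PER_MINUTE,
        (start + stride) / NANOS_PER_MINUTE
    );
}
```

With these inputs it prints window_start = 30 min, window_end = 45 min.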

@alamb (Contributor) left a comment

Looks great to me -- thank you @ametel01

@alamb added this pull request to the merge queue on Jul 1, 2026
@alamb (Contributor) commented Jul 1, 2026

Love it when we can close a ticket with some more documentation / example

Merged via the queue into apache:main with commit 0d2c791 on Jul 1, 2026 (36 checks passed)

Labels: documentation

Development: successfully merging this pull request may close these issues:
  • Support projecting columns that do not exist in the table

2 participants